Skip to content

Normalize tar extraction permissions for inaccessible archives#284

Merged
bjk7119 merged 1 commit into
mainfrom
permi
Jun 17, 2026
Merged

Normalize tar extraction permissions for inaccessible archives#284
bjk7119 merged 1 commit into
mainfrom
permi

Conversation

@bjk7119

@bjk7119 bjk7119 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • Bug Fixes
    • Improved TAR extraction: added a safer extraction flow that normalizes permissions (directories to 755, files to 644), recovers from permission errors by fixing and retrying extraction, and applied the fix across all TAR handling paths to prevent archives producing inaccessible files.

@bjk7119 bjk7119 requested a review from dd-jy June 12, 2026 04:23
@bjk7119 bjk7119 self-assigned this Jun 12, 2026
@bjk7119 bjk7119 added the chore [PR/Issue] Refactoring, maintenance the code label Jun 12, 2026
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@bjk7119, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 15 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9c5d14a4-b03e-4941-97af-4ad8d54dad73

📥 Commits

Reviewing files that changed from the base of the PR and between 9368053 and f0d1150.

📒 Files selected for processing (1)
  • src/fosslight_util/download.py
📝 Walkthrough

Walkthrough

Adds helpers to normalize permissions on extracted TAR members and a safe extraction wrapper that retries on PermissionError; replaces direct tarfile.extractall calls at three TAR extraction call sites in download utilities.

Changes

TAR Extraction Safety

Layer / File(s) Summary
Safe TAR extraction helpers
src/fosslight_util/download.py
Adds _fix_extracted_permissions(path) and _tar_extractall_safe(fname, open_mode, extract_path) to normalize extracted directory/file modes and retry extraction on PermissionError.
Integration into extraction call sites
src/fosslight_util/download.py
Replaces direct tarfile.extractall usage with _tar_extractall_safe for top-level .crate extraction, .tar.gz/.tgz and .tar.xz/.tar branches in extract_compressed_file, and the magic-bytes TAR detection path.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: normalizing tar extraction permissions for archives that are otherwise inaccessible, which aligns with the core implementation adding _tar_extractall_safe and _fix_extracted_permissions helpers.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch permi

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/fosslight_util/download.py (1)

1137-1139: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent .crate extraction: not using safe helper.

This .crate extraction path still uses raw tarfile.extractall() without the safe helper, while other tar formats (.tar.gz, .tar.xz, magic-bytes detection) were updated to use _tar_extractall_safe. The same permission issues could affect .crate files downloaded directly (not via RPM/SRPM).

♻️ Proposed fix
             elif fname.endswith(".crate"):
-                with contextlib.closing(tarfile.open(fname, "r:gz")) as t:
-                    t.extractall(path=extract_path)
+                _tar_extractall_safe(fname, "r:gz", extract_path)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_util/download.py` around lines 1137 - 1139, The .crate branch
currently calls tarfile.extractall directly which is inconsistent and unsafe;
change the block that checks fname.endswith(".crate") to open the tar with
contextlib.closing(tarfile.open(fname, "r:gz")) and call the existing helper
_tar_extractall_safe(t, extract_path) instead of t.extractall, preserving use of
the same tarfile handle variable (t) and extract_path so extraction uses the
safe permissions/path checks like the other branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fosslight_util/download.py`:
- Around line 1049-1081: _tar_extractall_safe currently calls tarfile.extractall
before validating member paths, allowing path traversal; instead, iterate
tf.getmembers() and validate each TarInfo's resolved extraction path is within
extract_path (reject or skip any unsafe member) before calling tf.extractall or
extracting members manually. Update both the initial extraction path and the
retry branch so you compute safe_members by checking
os.path.realpath(os.path.join(extract_path, member.name)) startswith the
realpath of extract_path, then pass only safe_members to tf.extractall (or
extract members individually), and keep using _member_extracted_paths and
_fix_extracted_permissions afterward; ensure the same safety check is applied
when reopening the tar for the PermissionError retry.

---

Outside diff comments:
In `@src/fosslight_util/download.py`:
- Around line 1137-1139: The .crate branch currently calls tarfile.extractall
directly which is inconsistent and unsafe; change the block that checks
fname.endswith(".crate") to open the tar with
contextlib.closing(tarfile.open(fname, "r:gz")) and call the existing helper
_tar_extractall_safe(t, extract_path) instead of t.extractall, preserving use of
the same tarfile handle variable (t) and extract_path so extraction uses the
safe permissions/path checks like the other branches.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3d951aa2-e98a-4974-9297-56d688949058

📥 Commits

Reviewing files that changed from the base of the PR and between a71ee67 and 5513987.

📒 Files selected for processing (1)
  • src/fosslight_util/download.py

Comment thread src/fosslight_util/download.py Outdated
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 1 unresolved review comment.

Files modified:

  • src/fosslight_util/download.py

Commit: 38dfc5f5e52c53acd52cf85b770ab4131fac8c11

The changes have been pushed to the permi branch.

Time taken: 2m 4s

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/fosslight_util/download.py (1)

1151-1152: ⚠️ Potential issue | 🟠 Major

Fix .crate extraction to go through _tar_extractall_safe
src/fosslight_util/download.py’s .crate branch uses raw t.extractall(...), bypassing _tar_extractall_safe’s safe member filtering and PermissionError/retry handling.

🔧 Suggested integration fix
             elif fname.endswith(".crate"):
-                with contextlib.closing(tarfile.open(fname, "r:gz")) as t:
-                    t.extractall(path=extract_path)
+                _tar_extractall_safe(fname, "r:gz", extract_path)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_util/download.py` around lines 1151 - 1152, The .crate
extraction branch currently opens the tarfile with
contextlib.closing(tarfile.open(...)) and calls t.extractall(path=extract_path),
bypassing the safe extractor; replace that raw extractall call with a call to
the existing _tar_extractall_safe helper (passing the opened TarFile object and
extract_path) so member filtering, PermissionError handling and retries are
applied; ensure you still open the tar with the same context (the symbol to
change is the block using tarfile.open and t.extractall and the helper to call
is _tar_extractall_safe).
♻️ Duplicate comments (1)
src/fosslight_util/download.py (1)

1055-1061: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Containment check is bypassable; prefix matching still allows traversal escapes.

Line 1059 and Line 1081 use member_path.startswith(real_extract). This is unsafe for prefix-collision paths (e.g. /tmp/outside passing for /tmp/out). Also, link members (issym / islnk) are not target-validated, so link-based escapes can still bypass directory containment.

🔧 Suggested hardening diff
@@ def _tar_extractall_safe(fname: str, open_mode: str, extract_path: str) -> None:
-            real_extract = os.path.realpath(extract_path)
+            real_extract = os.path.realpath(extract_path)
             safe_members = []
             for member in members:
-                member_path = os.path.realpath(os.path.join(extract_path, member.name))
-                if member_path.startswith(real_extract):
-                    safe_members.append(member)
+                member_path = os.path.realpath(os.path.join(real_extract, member.name))
+                try:
+                    if os.path.commonpath([member_path, real_extract]) != real_extract:
+                        continue
+                except ValueError:
+                    continue
+
+                if member.issym() or member.islnk():
+                    link_target = os.path.realpath(
+                        os.path.join(os.path.dirname(member_path), member.linkname)
+                    )
+                    try:
+                        if os.path.commonpath([link_target, real_extract]) != real_extract:
+                            continue
+                    except ValueError:
+                        continue
+
+                safe_members.append(member)
@@
-            real_extract = os.path.realpath(extract_path)
+            real_extract = os.path.realpath(extract_path)
             safe_retry_members = []
             for m in retry_members:
-                member_path = os.path.realpath(os.path.join(extract_path, m.name))
-                if member_path.startswith(real_extract):
+                member_path = os.path.realpath(os.path.join(real_extract, m.name))
+                try:
+                    if os.path.commonpath([member_path, real_extract]) != real_extract:
+                        continue
+                except ValueError:
+                    continue
+                if m.issym() or m.islnk():
+                    link_target = os.path.realpath(
+                        os.path.join(os.path.dirname(member_path), m.linkname)
+                    )
+                    try:
+                        if os.path.commonpath([link_target, real_extract]) != real_extract:
+                            continue
+                    except ValueError:
+                        continue
                     if m.isdir():
                         m.mode = m.mode | 0o755
                     elif m.isfile():
                         m.mode = m.mode | 0o644
                     safe_retry_members.append(m)
#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import os
extract = "/tmp/out"
for member in ["../outside/pwn", "sub/../../outside/pwn"]:
    real_extract = os.path.realpath(extract)
    member_path = os.path.realpath(os.path.join(extract, member))
    print(
        member,
        "resolved=", member_path,
        "startswith=", member_path.startswith(real_extract),
        "commonpath=", os.path.commonpath([member_path, real_extract]) == real_extract
    )
PY

rg -n "startswith\\(real_extract\\)|issym\\(|islnk\\(" src/fosslight_util/download.py -C2

Also applies to: 1077-1087

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_util/download.py` around lines 1055 - 1061, The extraction loop
currently uses member_path.startswith(real_extract) which is bypassable; replace
that containment check with os.path.commonpath([real_extract, member_path]) ==
real_extract to robustly ensure the member resolves inside the extract
directory, and additionally handle link members (TarInfo.issym and
TarInfo.islnk) by resolving their link targets (using
os.path.realpath(os.path.join(extract_path, member.linkname)) or similar) and
only include the member if the resolved link target also satisfies the same
commonpath containment check; skip any member (including directories, files,
symlinks, hardlinks) that fails these checks before calling tf.extractall with
the filtered safe_members.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/fosslight_util/download.py`:
- Around line 1151-1152: The .crate extraction branch currently opens the
tarfile with contextlib.closing(tarfile.open(...)) and calls
t.extractall(path=extract_path), bypassing the safe extractor; replace that raw
extractall call with a call to the existing _tar_extractall_safe helper (passing
the opened TarFile object and extract_path) so member filtering, PermissionError
handling and retries are applied; ensure you still open the tar with the same
context (the symbol to change is the block using tarfile.open and t.extractall
and the helper to call is _tar_extractall_safe).

---

Duplicate comments:
In `@src/fosslight_util/download.py`:
- Around line 1055-1061: The extraction loop currently uses
member_path.startswith(real_extract) which is bypassable; replace that
containment check with os.path.commonpath([real_extract, member_path]) ==
real_extract to robustly ensure the member resolves inside the extract
directory, and additionally handle link members (TarInfo.issym and
TarInfo.islnk) by resolving their link targets (using
os.path.realpath(os.path.join(extract_path, member.linkname)) or similar) and
only include the member if the resolved link target also satisfies the same
commonpath containment check; skip any member (including directories, files,
symlinks, hardlinks) that fails these checks before calling tf.extractall with
the filtered safe_members.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a2e6eced-9bc8-4dad-b6b4-3cd926882d8c

📥 Commits

Reviewing files that changed from the base of the PR and between 5513987 and 38dfc5f.

📒 Files selected for processing (1)
  • src/fosslight_util/download.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/fosslight_util/download.py (1)

1113-1115: ⚠️ Potential issue | 🟠 Major

Route standalone .crate files through the TAR helper (_tar_extractall_safe).

In src/fosslight_util/download.py (around lines 1113-1115), the .crate branch still calls tarfile.extractall() directly, bypassing _tar_extractall_safe’s PermissionError recovery (_fix_extracted_permissions() + retry) and its member-mode normalization after a permission failure.

Minimal fix
             elif fname.endswith(".crate"):
-                with contextlib.closing(tarfile.open(fname, "r:gz")) as t:
-                    t.extractall(path=extract_path)
+                _tar_extractall_safe(fname, "r:gz", extract_path)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fosslight_util/download.py` around lines 1113 - 1115, The .crate branch
currently opens the tar and calls t.extractall() directly; change it to route
through the existing helper _tar_extractall_safe so PermissionError recovery and
member-mode normalization run. Replace the inner call to
t.extractall(path=extract_path) with a call to _tar_extractall_safe (e.g.,
_tar_extractall_safe(t, extract_path)) while keeping the
contextlib.closing(tarfile.open(...)) wrapper, so _fix_extracted_permissions()
and the retry logic in _tar_extractall_safe are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fosslight_util/download.py`:
- Around line 1021-1033: The _fix_extracted_permissions function currently calls
os.stat() and os.chmod() on every entry found by os.walk(), which follows
symlinks and may alter targets outside extract_path; modify the logic to skip
symlinks by checking with os.path.islink() (or using os.lstat()) for each
directory and file before calling os.stat()/os.chmod() so that any symlink
entries are ignored and only regular filesystem objects are permission-changed;
update the loops handling dirs and files (references:
_fix_extracted_permissions, os.walk, os.chmod, os.stat) to perform this symlink
check and continue without attempting chmod on symlinks.

---

Outside diff comments:
In `@src/fosslight_util/download.py`:
- Around line 1113-1115: The .crate branch currently opens the tar and calls
t.extractall() directly; change it to route through the existing helper
_tar_extractall_safe so PermissionError recovery and member-mode normalization
run. Replace the inner call to t.extractall(path=extract_path) with a call to
_tar_extractall_safe (e.g., _tar_extractall_safe(t, extract_path)) while keeping
the contextlib.closing(tarfile.open(...)) wrapper, so
_fix_extracted_permissions() and the retry logic in _tar_extractall_safe are
used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd5aacb1-48d4-491d-9ea4-b174b935ea75

📥 Commits

Reviewing files that changed from the base of the PR and between 38dfc5f and 9368053.

📒 Files selected for processing (1)
  • src/fosslight_util/download.py

Comment thread src/fosslight_util/download.py
@bjk7119 bjk7119 merged commit 9e8e171 into main Jun 17, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore [PR/Issue] Refactoring, maintenance the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants